AITopics | network architecture

network architecture

A numerical study into neural network surrogate model performance for uncertainty propagation

Wade, Noah, Teferra, Kirubel

arXiv.org Machine Learning May-18-2026

Neural network surrogate models have emerged as a promising approach to model solution fields for a wide variety of boundary value problems encountered in physical modeling. Stochastic problems represent an area of particularly high interest because of the potential to significantly reduce the repeated evaluation of expensive forward models via traditional numerical solvers when conducting parametric analysis. However, many studies found in the literature primarily focus on the ability of neural network surrogate models to represent deterministic samples or mean field solutions and largely overlook surrogate model performance at the tails of the distribution. The present study examines in detail the ability of neural network surrogate models to capture the full distribution of solution fields over the entire probability space, with emphasis placed on the tails of the distribution. The canonical problem is the heat conduction equation with a highly stochastic source term, which induces extremely large variation in the thermal solution field. Comparisons are made between a classic feed-forward fully connected network and a Deep Operator Network architecture, using both data-driven and physics-informed loss functions. Results show that the worst-case prediction errors are an order of magnitude larger than the mean field error, highlighting the importance of the outlier samples. The large errors associated with extreme samples result from the networks having to extrapolate beyond the bounds of the training data. A method for identifying these samples is presented along with a discussion of potential approaches to account for their errors. Among the models considered, the fully connected neural network trained using a weak form residual loss performs best in handling these extrapolated inputs, achieving the highest prediction accuracy for the numerically produced datasets.
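The abstract attributes the largest errors to inputs that fall outside the bounds of the training data but does not describe how such samples are identified. As a rough illustration only, and not the authors' method, the sketch below flags a query as extrapolated when any input feature lies outside the per-feature range seen during training. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Bounds = List[Tuple[float, float]]


def training_bounds(samples: Sequence[Sequence[float]]) -> Bounds:
    """Per-feature (min, max) over the training inputs."""
    columns = list(zip(*samples))
    return [(min(col), max(col)) for col in columns]


def is_extrapolated(x: Sequence[float], bounds: Bounds) -> bool:
    """True if any feature of x lies outside the training range.

    This is a simple bounding-box test (an assumption for illustration);
    it ignores correlations between features.
    """
    return any(v < lo or v > hi for v, (lo, hi) in zip(x, bounds))


# Hypothetical two-feature training inputs and three query points.
train = [[0.1, 1.0], [0.5, 2.0], [0.9, 1.5]]
bounds = training_bounds(train)
queries = [[0.4, 1.2], [1.3, 1.2], [0.2, 0.5]]
flags = [is_extrapolated(q, bounds) for q in queries]
```

Here only the first query lies inside the training box. A bounding box is a coarse criterion; a distance-based or density-based test would be a natural refinement.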

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

doi: 10.1061/JENMDT/EMENG-8978

2605.16078

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.69)

Industry: Government > Military > Navy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

de670b9d118229d09d9a9bd9dec2598b-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems Apr-30-2026, 00:38:04 GMT

data mining, design choice, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (0.68)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(6 more...)

Neural Information Processing Systems Apr-27-2026, 10:23:28 GMT

A.1 Difference between the performance of two joint policies. In Section 3.1, the difference between the performance of two joint policies is expressed as follows: The proof is a multi-agent version of the proof in (Kakade and Langford, 2002). Now we provide the mathematical detail formally.

A.2 Approximation that matches the true value to first order. In Section 3.1, we claim that $J_{\pi}(\tilde{\pi})$ matches $J(\tilde{\pi})$ to first order. Intuitively, this means that a sufficiently small update of the joint policy which improves $J_{\pi}(\tilde{\pi})$ will also improve $J(\tilde{\pi})$. Now we prove it formally.
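For orientation, the well-known single-agent form of the performance difference identity from Kakade and Langford (2002) can be written as below. The excerpt does not show the paper's multi-agent statement, so the notation here ($\tilde{\pi}$ for the new policy, $A_{\pi}$ for the advantage under $\pi$, $\rho_{\pi}$ for the discounted state visitation, $L_{\pi}$ for the local surrogate) is an assumption taken from the single-agent literature rather than from this paper.

```latex
% Single-agent performance difference identity (notation assumed):
J(\tilde{\pi}) = J(\pi)
  + \mathbb{E}_{\tau \sim \tilde{\pi}}
    \left[ \sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t) \right]

% Local surrogate: the state distribution of \tilde{\pi} is replaced by that of \pi.
L_{\pi}(\tilde{\pi}) = J(\pi)
  + \sum_{s} \rho_{\pi}(s) \sum_{a} \tilde{\pi}(a \mid s) A_{\pi}(s, a)

% First-order match at the current parameters \theta_0:
L_{\pi_{\theta_0}}(\pi_{\theta_0}) = J(\pi_{\theta_0}), \qquad
\nabla_{\theta} L_{\pi_{\theta_0}}(\pi_{\theta}) \big|_{\theta = \theta_0}
  = \nabla_{\theta} J(\pi_{\theta}) \big|_{\theta = \theta_0}
```

The last line is what "matches to first order" means in the single-agent setting: the surrogate and the true objective agree in value and gradient at the current policy, so a sufficiently small improving step on one also improves the other.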

agent, artificial intelligence, section 3, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)

Appendix A Network Architectures

Neural Information Processing Systems Apr-25-2026, 21:58:26 GMT

In this section, we describe the details of the network architectures used in Sec. 4 and 5. We mainly used 4 GPUs (NVIDIA V100; 16GB) for the experiments in Sec. 4 and 5, and each seed took about 4 hours (for 3M steps). We conducted exhaustive evaluations through a large number of experiments, and we hope our empirical observations and recommendations help practitioners explore the large configuration space.

Adam Adam
Learning rate (policy): 1e-4, 5e-5, 3e-4, 3e-4
Learning rate (value): 1e-4, 1e-2, 3e-4, 3e-4
Weight initialization: Uniform, Xavier Uniform, Xavier Uniform, Xavier Uniform
Initial output scale (policy): 1.0, 1e-4, 1e-2, 1e-2
Target update: Hard, -, Soft (5e-3), Soft (5e-3)
Clipped Double Q: False, -, True, True

Table 7: Details of each network architecture. We refer to the original implementations of each algorithm, which are available online [23, 14, 48, 27, 42].
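The "Target update" row distinguishes hard updates from soft updates with coefficient 5e-3. The sketch below shows the usual meaning of those two terms on a flat list of parameters; it is a generic illustration, not code from the referenced implementations, and the function names are hypothetical.

```python
from typing import List


def hard_update(target: List[float], online: List[float]) -> None:
    """Copy the online parameters into the target network in place."""
    target[:] = online


def soft_update(target: List[float], online: List[float], tau: float = 5e-3) -> None:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = (1.0 - tau) * t + tau * o


# Toy parameters (hypothetical values).
online_params = [1.0, 1.0]

soft_target = [0.0, 1.0]
soft_update(soft_target, online_params)   # moves 0.5% of the way toward online

hard_target = [0.0, 1.0]
hard_update(hard_target, online_params)   # becomes an exact copy
```

With tau = 5e-3 the target network trails the online network slowly, whereas a hard update replaces it in a single step, typically at a fixed interval.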

artificial intelligence, machine learning, training step, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

The proposition makes use of the following observation: For the discriminator defined in (1), the norm of gradient for $w_t$ is upper bounded by $\|\nabla_{w_t} D_\theta(x)\|_F \le \|x\| \prod^{L} \cdots$

Neural Information Processing Systems Apr-25-2026, 21:09:05 GMT

The upper bound of the gradient's Frobenius norm for spectrally-normalized discriminators follows directly. As $l_w(x)$ is a linear transformation, we have $l_{cw}(x) = c\, l_w(x)$ and $l_w(cx) = c\, l_w(x)$. Moreover, since ReLU and leaky ReLU are linear in the $\mathbb{R}^+$ and $\mathbb{R}^-$ regions, we have $a_i(cx) = c\, a_i(x)$. In this section we discuss the gradients with respect to the actual parameter $w_i$. From Eq. (12) in [30] we know $\nabla_{w_t} D_\theta(x) = A$; we know that $\|\nabla_{w'_t} D_\theta(x)\|_F$, $\nabla_{o_t^l(x)} D_\theta(x)$, and $\|o_t^l(x)\|$ have upper bounds. From Theorem 1.1 in [44] we know that if $w_t$ is initialized with i.i.d. random variables from a uniform or Gaussian distribution, $\mathbb{E}\,\|w_t\|_{sp}$ is lower bounded away from zero at initialization. So $\|\nabla_{w_t} D_\theta(x)\|_F$ is upper bounded at initialization. Moreover, we observe empirically that $\|w_t\|_{sp}$ is usually increasing during training. Therefore, $\|\nabla_{w_t} D_\theta(x)\|_F$ is typically upper bounded during training as well. The following proposition states that spectral normalization also gives an upper bound on $\|H_{w_i}(D_\theta)(x)\|_{sp}$ for networks with ReLU or leaky ReLU internal activations.
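The argument above relies on the spectral norm $\|w\|_{sp}$ of each weight matrix. As a minimal sketch of how that quantity is commonly estimated, the code below runs power iteration on a small matrix and then divides the matrix by the estimate. It is a generic, dependency-free illustration and is not taken from the paper; the helper names are hypothetical and a nonzero matrix is assumed.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def _matvec(W: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _matvec_t(W: Matrix, u: List[float]) -> List[float]:
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def _unit(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spectral_norm(W: Matrix, iters: int = 100, seed: int = 0) -> float:
    """Estimate the largest singular value of W by power iteration.

    Assumes W is nonzero; the random start vector is seeded for repeatability.
    """
    rng = random.Random(seed)
    v = _unit([rng.gauss(0.0, 1.0) for _ in W[0]])
    for _ in range(iters):
        u = _unit(_matvec(W, v))      # left singular direction
        v = _unit(_matvec_t(W, u))    # right singular direction
    return math.sqrt(sum(x * x for x in _matvec(W, v)))


def spectral_normalize(W: Matrix) -> Matrix:
    """Rescale W so that its spectral norm is approximately 1."""
    sigma = spectral_norm(W)
    return [[w / sigma for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]          # singular values 3 and 1
sigma = spectral_norm(W)
W_sn = spectral_normalize(W)
```

For this diagonal example the estimate converges to 3, and the normalized matrix has spectral norm close to 1. Practical implementations usually run only one power-iteration step per training update and reuse the vector from the previous step.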

artificial intelligence, experiment, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendix

Neural Information Processing Systems Apr-25-2026, 16:05:09 GMT

We have shown experimentally that our method is effective in a variety of domains; however, other problem domains may require additional hyperparameter tuning, which can be expensive.

artificial intelligence, linear, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds

Neural Information Processing Systems Apr-25-2026, 14:56:13 GMT

Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. For enhancing the accuracy of such machine learning methods, it is often effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud, we choose a filtration for the point cloud, an increasing sequence of spaces. Since the performance of machine learning methods combined with persistent homology is highly affected by the choice of a filtration, we need to tune it depending on data and tasks. In this paper, we propose a framework that learns a filtration adaptively with the use of neural networks. In order to make the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we show a theoretical result on a finite-dimensional approximation of filtration functions, which justifies the proposed network architecture. Experimental results demonstrated the efficacy of our framework in several classification tasks.
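To make the role of the filtration concrete, the sketch below computes 0-dimensional persistence (the scales at which connected components merge) for a point cloud under the standard distance-based Vietoris-Rips filtration. This is the fixed, hand-chosen kind of filtration that the paper proposes to replace with a learned one; it is not the authors' method. The function name and the toy cloud are hypothetical, and the scale convention (a component dies at the full pairwise distance rather than half of it) is an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence


def zero_dim_persistence(points: Sequence[Sequence[float]]) -> List[float]:
    """Death scales of connected components in a Rips filtration.

    Every point is born at scale 0. Processing edges by increasing length
    with union-find, each merge of two components records one death.
    The last surviving component never dies and is not reported.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


# Three nearby points and one far-away point (hypothetical data).
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
deaths = zero_dim_persistence(cloud)

# Translating the cloud leaves pairwise distances, and hence the output, unchanged.
shifted = [(x + 10.0, y - 3.0) for x, y in cloud]
deaths_shifted = zero_dim_persistence(shifted)
```

Because the computation depends only on pairwise distances, the result is unchanged by translations and rotations, which is the isometry invariance the abstract asks of the learned filtration as well. The long-lived component here corresponds to the isolated point.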

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > Japan > Honshū (0.14)

Genre: Research Report (0.68)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Network Architecture Beyond Width and Depth

Neural Information Processing Systems Apr-25-2026, 03:11:50 GMT

This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth. Neural network architectures with height, width, and depth as hyper-parameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures (those with only width and depth as hyper-parameters), e.g., standard fully connected networks. The new network architecture is constructed recursively via a nested structure, and hence we call a network with the new architecture nested network (NestNet). A NestNet of height $s$ is built with each hidden neuron activated by a NestNet of height $\le s-1$.
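The recursive construction can be illustrated with a toy scalar example. The sketch below assumes that a height-1 network is an ordinary ReLU network and that a network of height s uses a network of height exactly s - 1 as the activation of every hidden neuron. The one-hidden-layer shape, the parameter layout, and all names are hypothetical simplifications and not the paper's construction.

```python
from typing import Callable, Dict, List, Tuple

Scalar = Callable[[float], float]
# (hidden weights, hidden biases, output weights, output bias) for one height level
Params = Tuple[List[float], List[float], List[float], float]


def relu(x: float) -> float:
    return max(0.0, x)


def make_net(weights: List[float], biases: List[float],
             out_w: List[float], out_b: float, activation: Scalar) -> Scalar:
    """One-hidden-layer scalar network with a pluggable activation."""
    def net(x: float) -> float:
        hidden = (a * activation(w * x + b)
                  for w, b, a in zip(weights, biases, out_w))
        return sum(hidden) + out_b
    return net


def nestnet(height: int, params: Dict[int, Params]) -> Scalar:
    """Height 1 uses ReLU; height s uses a height-(s-1) network as activation."""
    activation = relu if height == 1 else nestnet(height - 1, params)
    return make_net(*params[height], activation=activation)


# Hypothetical parameters: the height-1 network computes |x| = relu(x) + relu(-x);
# the height-2 network applies it as an activation to 2x - 1, giving |2x - 1|.
params = {
    1: ([1.0, -1.0], [0.0, 0.0], [1.0, 1.0], 0.0),
    2: ([2.0], [-1.0], [1.0], 0.0),
}
f1 = nestnet(1, params)
f2 = nestnet(2, params)
```

In this toy setup the height-2 network has a single hidden neuron, yet it produces a kink at x = 0.5 that comes from the inner network, which hints at how nesting can add expressiveness without adding width or depth at the outer level.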

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: